Read me data

Replicating the analyses in the paper "Oil 'Rents' and Political Development: What Do We Really Know about the Curse of Natural Resources?"
requires several steps.

Initial Source Data: 

Penn Energy Research/Oil & Gas Journal. 2014. "Worldwide Oil Field Production Survey: Historical Version. Data from 1980-2012." These data are proprietary, and the licensing
conditions we have agreed to do not allow us to provide them directly. They are available at a reasonable price to non-profit educational entities from Penn Energy,
and much of the information should be available, on an annual basis, in the print version of the Oil & Gas Journal at no cost beyond access to that journal.

Steps taken before imputation:

1. The source data (in an Excel file) are stripped of extraneous lines and national subtotals, and organized by country, oilfield, and year.
2. Some oilfields do not have unique names. Letters ("a", "b", "c", etc.) were appended to duplicate field names to render them unique (as fully described in the online appendix).
Similarly, oilfields identified as being in Yugoslavia prior to 1991 are typically identified as being in Croatia after 1991.
These were coded as being in Croatia for the entire dataset period so that each field is represented by a single, consistent set of entries (one per year) throughout
the dataset. This data cleaning is also described in the online appendix.
3. Once transformed into this structure, the data were saved in .csv format in the file "oil_32_data_cleaned."
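
The cleaning rules in step 2 can be illustrated with a short sketch. This is not the authors' code (the original cleaning was done on the Excel source and is documented in the online appendix); it is a minimal Python illustration under stated assumptions. The column names "country", "field", and "year", the "_a"/"_b" suffix format, and the rule that a name counts as a duplicate when it appears in more than one country are all hypothetical choices made for the example.

```python
from collections import defaultdict
from string import ascii_lowercase


def recode_country(row):
    """Return a copy of the row with Yugoslavia recoded as Croatia."""
    out = dict(row)
    if out["country"] == "Yugoslavia":
        out["country"] = "Croatia"
    return out


def make_field_names_unique(rows):
    """Recode countries, suffix duplicate field names, and sort the rows."""
    # Recode first, so a field listed under both Yugoslavia and Croatia
    # is treated as one field rather than as a duplicate name.
    rows = [recode_country(r) for r in rows]

    # For each field name, list the countries it appears in (first-seen order).
    countries_by_name = defaultdict(list)
    for r in rows:
        if r["country"] not in countries_by_name[r["field"]]:
            countries_by_name[r["field"]].append(r["country"])

    cleaned = []
    for r in rows:
        countries = countries_by_name[r["field"]]
        if len(countries) > 1:
            # Assumed convention: "_a", "_b", ... in order of first appearance.
            suffix = ascii_lowercase[countries.index(r["country"])]
            r = dict(r, field=f'{r["field"]}_{suffix}')
        cleaned.append(r)

    # Organize by country, oilfield, and year, as in step 1.
    return sorted(cleaned, key=lambda r: (r["country"], r["field"], r["year"]))


# Hypothetical rows; the field and country placeholders are not from the data.
example = [
    {"country": "Yugoslavia", "field": "Beta", "year": 1990},
    {"country": "Croatia", "field": "Beta", "year": 1992},
    {"country": "Country X", "field": "Alpha", "year": 1990},
    {"country": "Country Y", "field": "Alpha", "year": 1990},
]
for row in make_field_names_unique(example):
    print(row["country"], row["field"], row["year"])
```

In this example "Beta" ends up as a single Croatian field with one entry per year, while the two "Alpha" fields become "Alpha_a" and "Alpha_b". The actual assignment of letters in the dataset follows the online appendix, not this rule.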




There are two programs (programs folder) and three folders to clean and process the data for analysis. 

1. master_file.r loads the data, checks for missing R packages, and runs the imputation program (see below).
2. (1)_imputation_oil_replication.r is called first by master_file.r. It imputes missing values by country using random forests. It then exports and merges the data
into the imputed_values folder. The input file is "oil_32_data_cleaned"; the output is "oil_data_32_imputed.rda".
3. (2)_oil_prices_1980_2013.R is the program that matches oil price information to the characteristics of each oilfield in the data. The price data can be found
in the oil_prices subfolder of this replication folder. The code calculates the total dollar value (in inflation-adjusted dollars) of oil output for each
oilfield. The inputs are "oil_data_32_imputed.rda" and the oil price information (see the oil_prices folder). All oilfield, price, and oil value data are then saved
in a file called "oil_final.rdata" as well as a Stata 15 file, "oil_final.dta," which will be used for further processing and aggregation.
4. (3)_create_difficulties_and_aggregate_to_country_level is the Stata 15 do file that calculates the alternative specifications of the difficulty score for each
oilfield (using different weights, different standardizations of index components, etc.). The dataset is then aggregated to the country level, creating properly weighted averages
for each difficulty score. Observations from postcommunist countries before 1991 are dropped (e.g., there are no data for Kyrgyzstan before it was independent). Country names are corrected
for spelling errors and standardized to permit merging with political and other variables. The regions of the United Arab Emirates are also aggregated into a single
country entry (to match other variables), rather than each emirate being entered separately. Finally, the data are stored in the file "oil_final.dta". After being stripped of unused
variables from earlier stages of analysis, the resulting file is saved as "replication_data.rdata."
5. (4)_country_level_replication_kurtz_brooks.r is the file that takes "replication_data.rdata" as an input and produces all analyses referenced in the paper. It also 
reproduces all graphical representations of interaction effects. The country-level file "replication_data.rdata" is included for analysis in this replication archive.
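
The value calculation in step 3 and the weighted aggregation in step 4 can be sketched as follows. This is an illustrative Python sketch, not a translation of the R or Stata programs in this archive. It assumes that the dollar value of a field is production multiplied by a real price, that prices are matched on year plus a hypothetical field characteristic called "benchmark", and that the country-level difficulty is a value-weighted mean. The weights actually used, the matching keys, and all column names here are assumptions.

```python
from collections import defaultdict


def add_output_value(fields, prices):
    """Attach value = production * real price to each field-year record.

    `prices` maps (year, benchmark) to an inflation-adjusted price; the
    "benchmark" key is a hypothetical stand-in for the field characteristics
    used in the actual price matching.
    """
    valued = []
    for f in fields:
        price = prices[(f["year"], f["benchmark"])]
        valued.append(dict(f, value=f["production"] * price))
    return valued


def aggregate_to_country(valued):
    """Return total value and value-weighted mean difficulty per (country, year)."""
    total_value = defaultdict(float)
    weighted_sum = defaultdict(float)
    for f in valued:
        key = (f["country"], f["year"])
        total_value[key] += f["value"]
        weighted_sum[key] += f["value"] * f["difficulty"]
    # Skip country-years with zero total value to avoid dividing by zero.
    return {
        key: {
            "value": total_value[key],
            "difficulty": weighted_sum[key] / total_value[key],
        }
        for key in total_value
        if total_value[key] > 0
    }


# Hypothetical numbers, chosen only to make the arithmetic easy to follow.
prices = {(1990, "grade_1"): 2.0}
fields = [
    {"country": "Country X", "field": "Alpha", "year": 1990,
     "benchmark": "grade_1", "production": 100.0, "difficulty": 1.0},
    {"country": "Country X", "field": "Beta", "year": 1990,
     "benchmark": "grade_1", "production": 300.0, "difficulty": 3.0},
]
country_level = aggregate_to_country(add_output_value(fields, prices))
print(country_level[("Country X", 1990)])
```

With these made-up inputs the two fields are worth 200 and 600, so the country-year total is 800 and the weighted difficulty is (200 * 1 + 600 * 3) / 800 = 2.5. A simple unweighted mean would give 2.0, which is why the choice of weights matters for the aggregated score.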

 

